A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units

Author

  • Yunxin Zhao
Abstract

This paper describes a large vocabulary, speaker-independent, continuous speech recognition system which is based on hidden Markov modeling (HMM) of phoneme-sized acoustic units using continuous mixture Gaussian densities. A bottom-up merging algorithm is developed for estimating the parameters of the mixture Gaussian densities, where the resultant number of mixture components is proportional to both the sample size and dispersion of training data. A compression procedure is developed to construct a word transcription dictionary from the acoustic-phonetic labels of sentence utterances. A modified word-pair grammar using context-sensitive grammatical parts is incorporated to constrain task difficulty. The Viterbi beam search is used for decoding. The segmental K-means algorithm is implemented as a baseline for evaluating the bottom-up merging technique. The system has been evaluated on the TIMIT database (1990) for a vocabulary size of 853. For test set perplexities of 24, 104, and 853, the decoding word accuracies are 90.9%, 86.0%, and 62.9%, respectively. For the perplexity of 104, the decoding accuracy achieved by using the merging algorithm is 4.1% higher than that using the segmental K-means (22.8% error reduction), and the decoding accuracy using the compressed dictionary is 3.0% higher than that using a standard dictionary (18.1% error reduction).
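
The abstract pairs each absolute accuracy gain with a relative error reduction. The sketch below shows how the two quantities relate, assuming that word error is simply 100 minus word accuracy and that the reduction is measured against the baseline error; the abstract does not state the exact formula, and the function names are hypothetical. Because the published figures are rounded, the implied baselines are approximate.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of baseline errors removed; accuracies are in percent.

    Assumption: error rate = 100 - accuracy (not stated in the abstract).
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


def implied_baseline_error(abs_gain, rel_reduction):
    """Baseline error (percent) implied by an absolute accuracy gain
    (percentage points) and a relative error reduction (fraction)."""
    return abs_gain / rel_reduction


# Figures quoted in the abstract for perplexity 104.
kmeans_err = implied_baseline_error(4.1, 0.228)     # merging vs. segmental K-means
standard_err = implied_baseline_error(3.0, 0.181)   # compressed vs. standard dictionary

print(round(kmeans_err, 1))    # 18.0
print(round(standard_err, 1))  # 16.6
```

Under these assumptions, the segmental K-means baseline would have a word error of roughly 18% and the standard-dictionary baseline roughly 16.6%.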

Similar resources

Speaker Independent Phoneme Classification in Continuous Speech

This paper examines statistical models for phoneme classification. We compare the performance of our phoneme classification system, which uses Gaussian mixture (GMM) phoneme models, with systems that use hidden Markov (HMM) phoneme models. Measurements show that our model's performance is comparable to that of the HMM-based systems in context-independent phoneme classification.

Speaker, Vocabulary and Context Independent Word Spotting System for Continuous Speech

Word spotting is a widely known subject in continuous speech recognition and the traditional approaches use either hidden Markov models (HMM) or Gaussian mixture models (GMM). In this paper, we propose a different approach based on language independent phoneme modeling. The proposed system is speaker and vocabulary independent, and it is easy to implement. An equal error rate (EER) of 3.34% and...

Speech Recognition Using Monophone and Triphone Based Continuous Density Hidden Markov Models

Speech recognition is the process of transcribing speech to text. Phoneme-based modeling is used, wherein each phoneme is represented by a Continuous Density Hidden Markov Model. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech signal, and delta and double-delta features representing the temporal rate of change of the features are added, which considerably improves the recognition accura...

Restructuring Gaussian mixture density functions in speaker-independent acoustic models

In continuous speech recognition based on hidden Markov models (HMMs), word N-grams, and time-synchronous beam search, a local modeling mismatch in the HMM will often cause the recognition performance to degrade. To cope with this problem, this paper proposes a method of restructuring Gaussian mixture pdfs in a pre-trained speaker-independent HMM based on speech data. In this method, mixture compo...

Phoneme Based Acoustics Keyword Spotting in Informal Continuous Speech

This paper describes several approaches to keyword spotting (KWS) based on Gaussian mixture (GM) hidden Markov modelling (HMM). Context-independent and context-dependent phoneme models are used in our system. The system was trained and evaluated on informal continuous speech. We used KWS recognition networks of different complexities and different types of phoneme models. The impact of these parameters on ...

Journal title:
  • IEEE Trans. Speech and Audio Processing

Volume 1  Issue 

Pages  -

Publication date: 1993